759 research outputs found

    Weighted Heuristic Ensemble of Filters

    Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across datasets, and no single filter performs consistently well under all conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing feature selection ensembles treat the filters equally regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble by proposing a systematic way of assigning different weights to the feature selection methods (filters). We also investigate how to determine the appropriate weight for each filter in an ensemble. Theoretically and intuitively, adding more weight to ‘good filters’ should lead to better results, but experiments on ten benchmark datasets show that in practice the outcome is very uncertain. The assumption held for some examples in our experiments. In other cases, however, filters that had been assumed to perform well performed badly, leading to even worse results. Adding weight to filters might therefore not achieve much in terms of accuracy, while increasing complexity and time consumption and clearly decreasing stability.
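
    As a rough illustration of what weighting filters in an ensemble can mean, the sketch below combines per-filter feature scores through a weighted mean of ranks. This is only one possible aggregation rule and is not taken from the paper; the function name, the filter names and the weights are hypothetical.

```python
def weighted_ensemble_ranking(filter_scores, weights):
    """Combine per-filter feature scores into one ranking.

    filter_scores: dict mapping a (hypothetical) filter name to a list of
                   scores, one per feature, higher meaning more relevant.
    weights:       dict mapping the same filter names to non-negative weights.
    Returns feature indices ordered from most to least relevant.
    """
    n_features = len(next(iter(filter_scores.values())))
    total_weight = sum(weights[name] for name in filter_scores)
    combined = [0.0] * n_features
    for name, scores in filter_scores.items():
        # Rank 0 is the best (highest-scoring) feature for this filter.
        order = sorted(range(n_features), key=lambda i: -scores[i])
        for rank, feature in enumerate(order):
            combined[feature] += weights[name] * rank / total_weight
    # A lower weighted mean rank means a more relevant feature.
    return sorted(range(n_features), key=lambda i: combined[i])


# Two made-up filters that disagree about three features.
scores = {"filter_a": [0.9, 0.1, 0.5], "filter_b": [0.2, 0.8, 0.6]}
trust_a = weighted_ensemble_ranking(scores, {"filter_a": 3.0, "filter_b": 1.0})
trust_b = weighted_ensemble_ranking(scores, {"filter_a": 1.0, "filter_b": 3.0})
```

    The final ranking follows whichever filter carries more weight, which is why a filter wrongly assumed to be 'good' can make the ensemble worse.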

    Retinal vessel segmentation using Gabor Filter and Textons

    This paper presents a retinal vessel segmentation method that is inspired by the human visual system and uses a Gabor filter bank. Machine learning is used to optimize the filter parameters for retinal vessel extraction. The filter responses are represented as textons, which allows the corresponding membership functions to be used as the framework for learning vessel and non-vessel classes. Vessel texton memberships are then used to generate segmentation results. We evaluate our method using the publicly available DRIVE database. It achieves competitive performance (sensitivity=0.7673, specificity=0.9602, accuracy=0.9430) compared to other recently published work. These figures are particularly interesting as our filter bank is quite generic and only includes Gabor responses. Our experimental results also show that the performance, in terms of sensitivity, is superior to other methods.
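
    For readers unfamiliar with Gabor filter banks, a minimal sketch of the standard real-valued 2D Gabor kernel and a bank of evenly spaced orientations follows. The parameter values are placeholders chosen for illustration; the paper learns its filter parameters, and that learning step is not shown here.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope, stretched by the aspect ratio gamma.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            # Cosine carrier with the given wavelength and phase offset.
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, wavelength, sigma, n_orientations):
    """Kernels at n_orientations evenly spaced angles in [0, pi)."""
    thetas = [k * math.pi / n_orientations for k in range(n_orientations)]
    return [gabor_kernel(size, wavelength, t, sigma) for t in thetas]


# Illustrative values only: 7x7 kernels, 6 orientations.
bank = gabor_bank(size=7, wavelength=4.0, sigma=2.0, n_orientations=6)
```

    Convolving an image with each kernel in the bank gives one response per orientation at every pixel; those response vectors are the kind of input that a texton representation would then cluster.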

    Identification and Characterization of Novel Kinases that Regulate BRCA1 Expression and Function

    Transcriptional and functional regulation of the breast cancer susceptibility gene 1 (BRCA1) in the pathogenesis of sporadic breast cancers is poorly understood. We developed a functional assay, which assesses the ability of BRCA1 to localize to sites of DNA damage and form ionizing radiation-induced foci (IRIF), and used it to screen a kinase siRNA library; thirty-two potential positive regulators of BRCA1 were identified. Subsequent validation resulted in fourteen kinases that consistently diminished BRCA1 IRIF. Secondary screening assays for three selected kinases determined whether siRNA-mediated knockdown of the kinases caused an expression or function defect of BRCA1. Repair capacity and cell survival after DNA damage were characterized following siRNA-mediated knockdown of these three kinases. Our long-term goal is to describe signaling pathways that explain how the identified kinases are able to regulate BRCA1. This knowledge could potentially translate into a novel therapeutic approach for sporadic breast cancers expressing low levels of BRCA1.

    Applications of biophysical methods in small-molecule modulators targeting protein function

    The research of Wenjia Wang described in her thesis combines biophysical methods to investigate small-molecule modulators targeting protein function. One important application is structure-based drug design (SBDD). SBDD takes advantage of knowledge and basic understanding of the 3D structure of biomolecules (from X-ray crystallography, cryo-EM or nuclear magnetic resonance) to design potential binders (small molecules, peptides, or antibodies). In her thesis, she started with a review of known artificial macrocycles 63O/P/Q targeting human IL17A. Based on the known structures, an amino acid-derived scaffold was designed and screened by microscale thermophoresis (MST) and differential scanning fluorimetry (DSF). Furthermore, a novel small-molecule series was discovered by a combination of X-ray crystallography and MST, relying on the previous experience. Additionally, she worked on solving structures of other small-molecule modulators with their target proteins, including pyridoxal kinase from Plasmodium falciparum (Pf Pdxk) in complex with AMP-PNP and PL, and bovine carbonic anhydrase II (bCAII) with ligand-directed diazo transfer probes. In all, her thesis describes how multiple biophysical methods contribute to research on small-molecule modulators, especially in structure-based drug design.

    On Prediction Properties of Kriging: Uniform Error Bounds and Robustness

    Kriging based on Gaussian random fields is widely used in reconstructing unknown functions. The kriging method has pointwise predictive distributions which are computationally simple. However, in many applications one would like to predict for a range of untried points simultaneously. In this work we obtain some error bounds for the (simple) kriging predictor under the uniform metric. The bounds hold for a scattered set of input points in an arbitrary dimension, and also cover the case where the covariance function of the Gaussian process is misspecified. These results lead to a better understanding of the rate of convergence of kriging under the Gaussian or the Matérn correlation functions, the relationship between space-filling designs and kriging models, and the robustness of the Matérn correlation functions.
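
    For orientation, the objects named in this abstract can be written out in standard textbook notation. The symbols below are assumptions made for illustration and the paper's own notation and parametrization may differ: $x_1,\dots,x_n$ are the input points, $Y$ the vector of observed values, $\Phi$ the correlation function, and $\Omega$ the region of interest.

```latex
% Simple (zero-mean) kriging predictor at an untried point x
\[
  \hat f(x) = r(x)^{\top} \mathbf{K}^{-1} Y,
  \qquad \mathbf{K}_{ij} = \Phi(x_i - x_j),
  \qquad r(x)_i = \Phi(x - x_i).
\]

% Prediction error under the uniform metric
\[
  \sup_{x \in \Omega} \bigl| f(x) - \hat f(x) \bigr|.
\]

% Gaussian and Matern correlation functions (one common parametrization),
% with length scale \ell > 0, smoothness \nu > 0, and \mathcal{K}_\nu the
% modified Bessel function of the second kind
\[
  \Phi_{\mathrm{Gauss}}(h) = \exp\!\Bigl(-\frac{\|h\|^{2}}{2\ell^{2}}\Bigr),
  \qquad
  \Phi_{\mathrm{Mat}}(h) = \frac{2^{1-\nu}}{\Gamma(\nu)}
    \Bigl(\frac{\sqrt{2\nu}\,\|h\|}{\ell}\Bigr)^{\nu}
    \mathcal{K}_{\nu}\!\Bigl(\frac{\sqrt{2\nu}\,\|h\|}{\ell}\Bigr).
\]
```

    Misspecification in this setting means that the $\Phi$ used to build $\hat f$ differs from the correlation function of the process that generated the data.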

    Multi-Resolution Functional ANOVA for Large-Scale, Many-Input Computer Experiments

    The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, application of Gaussian process models is greatly limited in practice, particularly for large-scale and many-input computer experiments that have become typical. We propose a multi-resolution functional ANOVA model as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input non-linear regression problems. An overlapping group lasso approach is used for estimation, ensuring computational feasibility in a large-scale and many-input setting. New results on consistency and inference for the (potentially overlapping) group lasso in a high-dimensional setting are developed and applied to the proposed multi-resolution functional ANOVA model. Importantly, these results allow us to quantify the uncertainty in our predictions. Numerical examples demonstrate that the proposed model enjoys marked computational advantages. Data capabilities, both in terms of sample size and dimension, meet or exceed the best available emulation tools while meeting or exceeding emulation accuracy.

    Clustering ensemble method

    A clustering ensemble aims to combine multiple clustering models to produce a better result than that of the individual clustering algorithms in terms of consistency and quality. In this paper, we propose a clustering ensemble algorithm with a novel consensus function named Adaptive Clustering Ensemble. It employs two similarity measures, cluster similarity and a newly defined membership similarity, and works adaptively through three stages. The first stage is to transform the initial clusters into a binary representation, and the second is to aggregate the initial clusters that are most similar based on the cluster similarity measure between clusters. This step iterates adaptively until the intended candidate clusters are produced. The third stage is to further refine the clusters by dealing with uncertain objects to produce an improved final clustering result with the desired number of clusters. Our proposed method is tested on various real-world benchmark datasets and its performance is compared with other state-of-the-art clustering ensemble methods, including the Co-association method and the Meta-Clustering Algorithm. The experimental results indicate that on average our method is more accurate and more efficient.
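
    The first two stages described in this abstract can be sketched roughly as follows. The binary representation is the usual one-vector-per-cluster encoding; the Jaccard index is used here only as a stand-in for the paper's cluster similarity measure, which the abstract does not define. All function names are hypothetical.

```python
def binary_membership(labelings):
    """Encode every cluster of every base clustering as a 0/1 membership vector.

    labelings: list of label lists, one per base clustering, all of equal length.
    """
    vectors = []
    for labels in labelings:
        for cluster_id in sorted(set(labels)):
            vectors.append([1 if lab == cluster_id else 0 for lab in labels])
    return vectors


def jaccard(u, v):
    """Jaccard similarity of two binary vectors (assumed stand-in measure)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def most_similar_pair(vectors):
    """Indices (i, j), i < j, of the two most similar clusters."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    return max(pairs, key=lambda p: jaccard(vectors[p[0]], vectors[p[1]]))


# Two base clusterings of four objects that agree up to a relabelling.
vectors = binary_membership([[0, 0, 1, 1], [1, 1, 0, 0]])
pair = most_similar_pair(vectors)
```

    An aggregation loop would repeatedly merge the pair returned by `most_similar_pair` until the candidate clusters are reached; the stopping rule and the refinement of uncertain objects are specific to the paper and are not sketched here.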